Label Poisoning is All You Need

Neural Information Processing Systems

In a backdoor attack, an adversary injects corrupted data into a model's training dataset in order to gain control over its predictions on images with a specific attacker-defined trigger. A typical corrupted training example requires altering both the image, by applying the trigger, and the label. Models trained on clean images were therefore considered safe from backdoor attacks. However, in some common machine learning scenarios, the training labels are provided by potentially malicious third parties; such scenarios include crowd-sourced annotation and knowledge distillation. Hence, we investigate a fundamental question: can we launch a successful backdoor attack by corrupting only the labels?
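The paper's actual attack algorithm is not described in this abstract. As a rough illustration of the label-only threat model, the sketch below flips the labels of a small set of training examples toward an attacker-chosen target class while leaving every image untouched; the function name poison_labels_only, the candidate_idx selection, and the budget parameter are all hypothetical, not the paper's method.

```python
import numpy as np

def poison_labels_only(labels, candidate_idx, target_class, budget):
    """Illustrative label-only corruption: images are never modified,
    only the labels of a few candidate training examples are flipped
    to the attacker's target class.

    candidate_idx: indices the attacker wants to relabel (e.g. chosen
    by some scoring heuristic); budget caps how many labels are flipped.
    """
    poisoned = np.array(labels, copy=True)
    chosen = list(candidate_idx)[:budget]   # respect the corruption budget
    poisoned[chosen] = target_class         # corrupt labels, not images
    return poisoned, chosen

# Example usage (hypothetical data):
# labels = np.array([3, 7, 1, 0, ...])
# poisoned_labels, flipped = poison_labels_only(labels, candidates,
#                                               target_class=0, budget=150)
```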



Appendix A: Details of datasets and architectures (A.1 Object Detection Image Dataset)

Neural Information Processing Systems

We evaluate our method on three well-known object detection architectures, including SSD, as well as on Named Entity Recognition and Question Answering tasks; more details are given in Table 5. We report Recall, ROC-AUC, and average scanning overhead for each model; a ROC-AUC of 1 indicates perfect classification, while a value of 0.5 indicates chance-level performance. To the best of our knowledge, there are no existing detection methods for object detection models. We also evaluate the IoU threshold used to calculate the ASR of inverted triggers; a threshold of 0.7 tends to degrade detection performance. Different score thresholds are likewise tested when computing the ASR of inverted triggers.
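The exact ASR definition used in the appendix is not reproduced here. As a hedged sketch of one plausible way to score an inverted trigger on an object detector, the code below counts an image as a successful attack when some predicted box of the target class exceeds both an IoU threshold against the expected region and a confidence score threshold; the function name, the per-image record layout, and the default thresholds are assumptions for illustration only.

```python
def attack_success_rate(predictions, targets, iou_thresh=0.5, score_thresh=0.5):
    """Illustrative ASR for an inverted trigger on an object detector.

    predictions: per image, a list of dicts {"box": (x1, y1, x2, y2),
                 "cls": int, "score": float} (hypothetical layout).
    targets:     per image, a dict {"box": (x1, y1, x2, y2), "cls": int}
                 describing the attacker's expected detection.
    """
    def iou(a, b):
        # Intersection-over-union of two (x1, y1, x2, y2) boxes.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    hits = 0
    for pred_boxes, target in zip(predictions, targets):
        # Success if any predicted box matches the target class with
        # sufficient overlap and confidence.
        success = any(
            p["cls"] == target["cls"]
            and p["score"] >= score_thresh
            and iou(p["box"], target["box"]) >= iou_thresh
            for p in pred_boxes
        )
        hits += int(success)
    return hits / max(len(targets), 1)
```

Sweeping iou_thresh and score_thresh over a grid would reproduce the kind of threshold sensitivity study the appendix describes.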